Search CORE

arXiv.org e-Print Archive

FigShare

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

Author: DeGiorgio Michael
Hellmann Ines
Huber Christian D.
Hubisz Melissa J.
Nielsen Rasmus
Publication venue
Publication date: 26/05/2015
Field of study

SweepFinder is a popular program that implements a powerful likelihood-based method for detecting recent positive selection, or selective sweeps. Here, we present SweepFinder2, an extension of SweepFinder with increased sensitivity and robustness to the confounding effects of mutation rate variation and background selection, as well as increased flexibility that enables the user to examine genomic regions in greater detail and to specify a fixed distance between test sites. Moreover, SweepFinder2 enables the use of invariant sites for sweep detection, increasing both its power and precision relative to SweepFinder

Public Library of Science (PLOS)

Error and Error Mitigation in Low-Coverage Genome Assemblies

Author: Hubisz Melissa J.
Kellis Manolis
Lin Michael F.
Siepel Adam
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/11/2010
Field of study

The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ~2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1–4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download.National Science Foundation (U.S.) (Faculty Early Career Development grant DBI-0644111)National Science Foundation (U.S.) (Faculty Early Career Development grant DBI-0644282)National Science Foundation (U.S.) (Faculty Early Career Development grant U54 HG004555-01)David & Lucile Packard FoundationDavid & Lucile Packard Foundation (Fellowship for Science and Engineering

CiteSeerX

DSpace@MIT

Cold Spring Harbor Laboratory Institutional Repository

Public Library of Science (PLOS)

Recommended from our members

Patterns of Positive Selection in Six Mammalian Genomes

Author: Bustamante Carlos D.
Fonseca Rute R. da
Hubisz Melissa J.
Kosiol Carolin
Nielsen Rasmus
Siepel Adam
Vinař Tomáš
Publication venue
Publication date: 03/01/2024
Field of study

Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of ∼16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR<0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible “selection histories” of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection.</p

Knowledge UChicago

Localizing Recent Adaptive Evolution in the Human Genome

Author: Andrew G Clark
Bret A Payseur
Carlos D Bustamante
Gil McVean
Melissa J Hubisz
Rasmus Nielsen
Scott H Williamson
The International HapMap Consortium
Publication venue: Public Library of Science
Publication date: 01/01/2007
Field of study

Identifying genomic locations that have experienced selective sweeps is an important first step toward understanding the molecular basis of adaptive evolution. Using statistical methods that account for the confounding effects of population demography, recombination rate variation, and single-nucleotide polymorphism ascertainment, while also providing fine-scale estimates of the position of the selected site, we analyzed a genomic dataset of 1.2 million human single-nucleotide polymorphisms genotyped in African-American, European-American, and Chinese samples. We identify 101 regions of the human genome with very strong evidence (p < 10−5) of a recent selective sweep and where our estimate of the position of the selective sweep falls within 100 kb of a known gene. Within these regions, genes of biological interest include genes in pigmentation pathways, components of the dystrophin protein complex, clusters of olfactory receptors, genes involved in nervous system development and function, immune system genes, and heat shock genes. We also observe consistent evidence of selective sweeps in centromeric regions. In general, we find that recent adaptation is strikingly pervasive in the human genome, with as much as 10% of the genome affected by linkage to a selective sweep

CiteSeerX

Copenhagen University Research Information System

arXiv.org e-Print Archive

A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

Author: A Auton
A Kong
A Navarro
A Necşulea
A Ratnakumar
A Siepel
Adam Siepel
AJ Jeffreys
AJ Webb
AP Boyle
BC Lamb
C Kosiol
CC Spencer
CF Mugal
D Karolchik
D Kostka
Dennis Kostka
E Mancera
G Marais
Graham Coop
J Berglund
J Harrow
J Romiguier
JA Capra
JM Chen
John A. Capra
JW IJdo
K Lindblad-Toh
K Pollard
Katherine S. Pollard
L Arbiza
L Duret
L Duret
LR Meyer
M Blanchette
M Hasegawa
Melissa J. Hubisz
MJ Hubisz
N Galtier
N Galtier
N Lartillot
P Flicek
P Stenson
RD George
S Glémin
S Katzman
S Katzman
S Myers
S Myers
SE Ptak
ST Sherry
T Nagylaki
TC Brown
TR Dreszer
W Winckler
Y Zhang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013
Field of study

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al

Cold Spring Harbor Laboratory Institutional Repository

D-Scholarship@Pitt

FigShare

Tracking genes and finding mutations: finding genes for complex traits in the domestic dog (Canis familiaris)

Author: Auton Adam
Boyko Adam
Brisbin Abra
Bustamante Carlos D
Cargill Michelle
Degenhardt Jeremiah D
Elkahloun Abdel G
Hubisz Melissa J
Li Lin
Lohmueller Kurt E
Mosher Dana S
Novembre John
Ostrander Elaine A
Parker Heidi G
Quignon Pascale
Reynolds Andy
Schoenebeck Jeffrey J
Siepel Adam
Sutter Nathan B
Von Holdt Bridgett M
Wayne Robert K
Zhao Keyan
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Recommended from our members

A high-resolution map of human evolutionary constraint using 29 mammals.

Author: Alföldi Jessica
Baldwin Jen
Baylor College of Medicine Human Genome Sequencing Center Sequencing Team
Beal Kathryn
Birney Ewan
Bloom Toby
Broad Institute Sequencing Platform and Whole Genome Assembly Team
Chang Jean
Chin Chee Whye
Clamp Michele
Clawson Hiram
Cree Andrew
Cuff James
Delehaunty Kim
Di Palma Federica
Dihn Huyen H
Dooling David
Ernst Jason
Fitzgerald Stephen
Flicek Paul
Fowler Gerald
Fronik Catrina
Fulton Bob
Fulton Lucinda
Garber Manuel
Genome Institute at Washington University
Gibbs Richard A
Gnerre Sante
Goldman Nick
Graves Tina
Green Eric D
Guttman Mitchell
Haussler David
Heiman Dave
Herrero Javier
Holloway Alisha K
Hubisz Melissa J
Jaffe David B
Jhangiani Shalili
Jordan Gregory
Joshi Vandita
Jungreis Irwin
Kellis Manolis
Kent W James
Kheradpour Pouya
Kostka Dennis
Kovar Christie L
Lander Eric S
Lara Marcia
Lee Sandra
Lewis Lora R
Lin Michael F
Lindblad-Toh Kerstin
Lowe Craig B
Mardis Elaine R
Margulies Elliott H
Martins Andre L
Massingham Tim
Mauceli Evan
Minx Patrick
Moltke Ida
Muzny Donna M
Nazareth Lynne V
Nicol Robert
Nusbaum Chad
Okwuonu Geoffrey
Parker Brian J
Pedersen Jakob S
Pollard Katherine S
Raney Brian J
Rasmussen Matthew D
Robinson Jim
Santibanez Jireh
Siepel Adam
Sodergren Erica
Stark Alexander
Vilella Albert J
Ward Lucas D
Warren Wesley C
Washietl Stefan
Weinstock George M
Wen Jiayu
Wilkinson Jane
Wilson Richard K
Worley Kim C
Xie Xiaohui
Young Sarah
Zody Michael C
Zuk Or
Publication venue: eScholarship, University of California
Publication date: 01/10/2011
Field of study

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease

A High-Resolution Map of Human Evolutionary Constraint Using 29 Mammals

Author: A Keinan
A Siepel
A Siepel
A Stark
Adam Siepel
Albert J. Vilella
Alexander Stark
Alisha K. Holloway
Andre L. Martins
Brian J. Parker
Brian J. Raney
CB Lowe
Christie L. Kovar
Craig B. Lowe
D Altshuler
D Baek
D Pillas
D Schmidt
David B. Jaffe
David Haussler
Dennis Kostka
Donna M. Muzny
Elaine R. Mardis
Elliott H. Margulies
Eric D. Green
Eric S. Lander
ES Lander
ET Wang
EV Davydov
Evan Mauceli
Ewan Birney
F Chiaromonte
Federica Di Palma
G Bejerano
Genome 10K Community Of Scientists
George M. Weinstock
GM Cooper
Gregory Jordan
Hiram Clawson
Ida Moltke
Irwin Jungreis
J Ernst
J Ernst
J Harrow
JA Drake
Jakob S. Pedersen
James Cuff
Jason Ernst
Javier Herrero
Jean Chang
Jessica Alföldi
Jiayu Wen
Jim Robinson
JS Pedersen
JT Lee
JW Thomas
K Lindblad-Toh
Katherine S. Pollard
Kathryn Beal
KD Pruitt
Kerstin Lindblad-Toh
Kim C. Worley
KS Pollard
Lucas D. Ward
M Clamp
M Garber
M Guttman
M Kellis
Manolis Kellis
Manuel Garber
Marcia Lara
Maria L. Martínez-Chantar
Matthew D. Rasmussen
Melissa J. Hubisz
MF Lin
MF Lin
Michael C. Zody
Michael F. Lin
Michele Clamp
Mitchell Guttman
MJ Hubisz
Nick Goldman
Or Zuk
P Kheradpour
Paul Flicek
Pouya Kheradpour
RA Gibbs
RH Waterston
Richard A. Gibbs
Richard K. Wilson
S Gnerre
S Maenner
S Meader
S Prabhakar
S Tumpel
S Washietl
Sante Gnerre
Stefan Washietl
Stephen Fitzgerald
Tim Massingham
TS Mikkelsen
W. James Kent
Wesley C. Warren
X Lampe
X Xie
Xiaohui Xie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.National Human Genome Research Institute (U.S.)National Institute of General Medical Sciences (U.S.) (Grant number GM82901)National Science Foundation (U.S.). Postdoctural Fellowship (Award 0905968)National Science Foundation (U.S.). Career (0644282)National Institutes of Health (U.S.) (R01-HG004037)Alfred P. Sloan Foundation.Austrian Science Fund. Erwin Schrodinger Fellowshi

DSpace@MIT